# Assignment 3 by Jonathan "JONO" O'Dowd (jodowd@umich.edu)
# SIADS 521: Visual Exploration of Data
# Professor: Christopher Brooks
# Date of submission: 11/24/2019
# Import tools
from IPython.display import display, HTML; display(HTML("<style>.container { width:99% !important; }</style>"))
%matplotlib inline
# (20%) Are you making a compelling computational narrative, judged in part by Rule et al’s ten rules for computational analyses?
# You don’t need to follow all of the rules all of the time, but you must explicitly indicate at the header of each notebook which rules you adhered to and what the evidence was.
# As described in grading rubric #1, it is expected that you will provide a narrative of how you adhered to or interpreted several of Rule et al's heuristics for computational narratives.
# While there are no hard limits on the number of rules you should address, I expect at least three rules would be able to be discussed for a notebook of this size, and that discussion
# and evidence of how you aligned with those rules would be on the order of 1-2 paragraphs per rule.
print("Assignment 3 by Jonathan 'JONO' O'Dowd (jodowd@umich.edu)")
print("")
print("Compelling Computational Narrative (20%)")
# Rule 1: Tell the story. Not just what or how you did it but why. How steps are connected and what it all means.
# Rule 2: make sure to document all your explorations, even (or perhaps especially) those that led to dead ends.
# Rule 3: Use cell divisions to make each step clear. Each cell should accomplish one task or have one main purpose.
# Rule 5: Record dependencies. Show what was required for particular technique and/or library to work properly. Describe other tools used like Anaconda's Command Prompt for pre-requisite installation.
# These below were from assignment 2. Browse through them and use whatever may seem relevant this time around. Since it's my thoughts on same rules, no sense in re-inventing the wheel.
print("")
print("Response below on how I followed: Rule 1: Tell the story. Not just what or how you did it but why. How steps are connected and what it all means.")
print("This final assignment was the BEST ONE yet! I thoroughly enjoyed the idea of Professor Brooks being my client. It allowed me to start from the very beginning of taking his personal")
print("data set, restructuring the data for acceptance into a data visualization tool and data library and then presenting the final product in such a way to tell the story behind")
print("Professor Brooks' exercise data and give him an analysis of what it could possibly mean for him in terms of his exercise program and best practices for optimal health.")
print("I believe I followed Rule #1 of telling the story because I included most of my thoughts in trying to accomplish this overall task and I tried to communicate them")
print("in such a way to speak in simple, concise terms that another student in our cohort could learn about the visualization tools and library I chose. I believe most everyone in our")
print("class will be able to follow my narrative because I spoke from a perspective of a novice. I didn't use fancy terminology nor did I pick a path that was highly advanced. Because")
print("I am new to data science and programming, it is best for me to approach things at a very basic level. I chose certain techniques such as scatter plot, violin plot, and the heat map because they fit")
print("the data well in his csv file. I also attempted a slightly more difficult, advanced technique, geographical mapping, as it fit the coordinates of his running/cycling paths well.")
print("I mostly used the Seaborn and Plotly libraries as they were the simplest approach, requiring fewer dependencies; Seaborn in particular works closely as a continuation of matplotlib and pandas. The steps I")
print("used are all connected and well documented to show the beginning of the data visualization process to its final visualization end. I thoroughly believe one could take this notebook and easily")
print("follow the procedure and end up with the same or perhaps better outcome.")
print("")
print("Response below on how I followed: Rule 2: Make sure to document all your explorations, even (or perhaps especially) those that led to dead ends.")
print("Rule #2 was followed in this notebook. As they say, the 'proof is in the pudding'. If one reads through this notebook, he/she will encounter 'the good, the bad, & the ugly'. I didn't")
print("try to pretend to be some superhero data scientist. I am a novice and it shows. I communicated this from the beginning and documented all my steps from failure to success. The failure")
print("portion was primarily attempting to go down the path of Altair as my library choice. I had to abandon this path after spending so many hours trying to get past the dependencies and then")
print("likewise the data tidying process. I just could not get it to work. Seaborn and Plotly are both heroes to me. They really bailed me out in helping me to get started in another library without being")
print("too technical or 'over my head'.")
print("")
print("Response below on how I followed: Rule 3: Use cell divisions to make each step clear. Each cell should accomplish one task or have one main purpose.")
print("Rule #3 involves using cell divisions appropriately to communicate what process was followed. Each cell I used in this notebook served a purpose. The first few steps were merely used")
print("for documentation purposes to introduce this assignment and to explain what was being done and why. The final few steps utilized individual cells to communicate each visual output.")
print("")
print("Response below on how I followed: Rule 5: Record dependencies. Show what was required for particular technique and/or library to work properly. Describe other tools used.")
print("While I didn't end up using Altair, I will proceed to explain what pre-requisite steps are needed since I gained knowledge through my mistakes and believe others can learn from these errors.")
print("Take my steps, however, with a 'grain of salt' keeping in mind that my attempt to use Altair failed. The following are items required before Altair could be used:")
print("Step 1: Use Anaconda's command prompt (python console) to install Altair via this command: pip install -U altair vega_datasets jupyterlab.")
print("Step 2: After installing Altair, you should receive this message in the console: 'Successfully installed altair-3.2.0 jupyterlab-1.2.3 vega-datasets-0.7.0. Note: you may need to")
print("restart the kernel to use updated packages.'")
print("Step 3: In addition to import of altair as alt and import pandas as pd, you must also include the following: alt.renderers.enable('notebook')")
print("Step 4: I had to also run this from the console as vega was required: pip install vega==1.3.")
print("Step 5: Data Frame must be in Tidy format to work with Altair http://bebi103.caltech.edu.s3-website-us-east-1.amazonaws.com/2018/tutorials/t2a_tidy_data.html .")
print("My problem with Altair was mostly with Step 5's attempt to put the data in 'tidy' format to work with its visualization techniques. I know it was mostly my attempt to format the data")
print("rather than missing certain dependencies because I was able to get manually inserted values to work with Altair's version of scatterplot, for example. The failure occurred when trying")
print("to use my data set with it; just could not get it to work.")
print("After hours and hours of trouble with Altair, I abandoned the effort and went to Seaborn. I have a strong affinity for Seaborn regarding pre-requisites in terms of low overhead. Because")
print("it works so closely with matplotlib, there really aren't too many 'hoops' to jump through prior to its application. For this purpose, all that I found to be required was the following:")
print("import matplotlib.pyplot as plt, import pandas as pd, and import seaborn as sns.")
print("I also used PLOTLY quite a bit. I found that it had so few dependencies. It was just like Seaborn. You mostly just had to import it with your normal imports of pandas and matplotlib etc.")
print("One last note: at first, my graphs done using PLOTLY did NOT show up in the exported html file. I remembered Chris Thorne posted something about this on the last assignment.")
print("So I took his advice and added the following code to correct this issue: import plotly.offline as py and py.init_notebook_mode(connected=False) and py.iplot(fig). However, that didn't work and")
print("so I just kept trying the download and eventually it worked out on its own. What I think resolved it was that I restarted the kernel and re-ran the ipynb file again and then downloaded as html.")
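# A minimal sketch of the 'tidy' reshaping that Step 5 above refers to, using pandas' melt.
# It turns one-column-per-measure data into one row per (timestamp, measure) observation.
# Assumptions: the helper name to_tidy, the 'measure'/'value' column names, and the sample
# rows below are hypothetical and made up for illustration; they are not taken from
# strava.csv, and this is not the code from the abandoned Altair attempt.
import pandas as pd

def to_tidy(wide_frame, id_col, value_cols):
    """Reshape wide data into long ('tidy') form with one observation per row."""
    tidy = wide_frame.melt(id_vars=[id_col], value_vars=value_cols,
                           var_name="measure", value_name="value")
    # Drop observations that were never recorded for a given measure
    return tidy.dropna(subset=["value"]).reset_index(drop=True)

# Hypothetical sample rows (not real readings)
example_wide = pd.DataFrame({
    "timestamp": ["2019-08-17 10:00:00", "2019-08-17 10:00:01"],
    "cadence": [78.0, 80.0],
    "heart_rate": [120.0, None],
})
example_tidy = to_tidy(example_wide, "timestamp", ["cadence", "heart_rate"])
# In a tidy frame like this, a charting library could map 'measure' to color and 'value' to the y axis.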
# (45%) Have you demonstrated that you have a solid grasp of at least three of the basic visual analysis techniques in this class (scatter, box, line, violin, histograms, heatmaps, probability plots, treemaps, sploms)
# and that they were appropriate for the analysis/data you were investigating?
# You get equal grades for each plot type (15% each), and grades for a given plot will be broken down into three equal categories (5% each):
# The mechanics of generating a reasonable plot from the data you are working with.
# The justification for the plot and the insight as a result, as described by your computational narrative.
# Making the plot rock visually, by embedding advanced features ranging from the aesthetic (color, form) to the informational (callouts, annotations).
print("See computational narrative cell divisions further below to end of this notebook. You will see I followed the instructions and met all of these requirements in this section.")
# (20%) Are you able to provide an interesting and defensible analysis that helps me understand what this data means in the context of my activities? Think of me as the client --
# if your data science discovery makes me happy then this part of the overall grade tilts up towards 20%. If I think there are obvious things you should have looked at then it tilts down towards 0%.
print("For this requirement, I included an analysis above each plot marked: 'Analysis for Professor Brooks' to communicate what the data means as seen through the visual plotting. Scroll below all the way to end of notebook taking note of each one along the way.")
# Attempt to make sense of strava.csv data and output visuals helpful to Professor Brooks. This first attempt is to home in on the 'cadence' field values and report in terms of recommended cycling rpm vs
# Prof Brooks' rpm and then in terms of recommended running steps per min vs Prof Brooks' actual steps per min. This would be helpful to him to learn to adjust his pedaling and steps for max results
# or safer (staying healthy) results.
# Import tools
from IPython.display import display, HTML; display(HTML("<style>.container { width:99% !important; }</style>"))
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
# Read source files
data_cycle=pd.read_csv("strava.csv",index_col='timestamp',usecols=['Cadence','timestamp','cadence'])
# Cleanup data to prepare only what's relevant for visual output
data_cycle=data_cycle.dropna()
# Calculate Professor Brooks CYCLING Cadence mean.
print("Interesting facts about Professor Brooks Cycling Cadence and Running Cadence:")
print("Professor Brooks CYCLING Cadence Mean",data_cycle['cadence'].mean())
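# Output of data for visual communication (display() renders both; a bare expression only renders when it is the last line of a cell)
display(data_cycle.describe())
display(data_cycle)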
# Citation Source Information and Quote:
# "While there's no one magic number, aiming for 90 RPM is a good goal to avoid leg fatigue and
# making the most out of those slow-twitch muscles. Average cyclists have a cadence of about 60 RPM; advanced and elite cyclists pedal anywhere from 80 to 100 RPMs."
# Fitness, Wahoo. “Cycling Cadence: What Is It & How to Improve Yours.” Wahoo Fitness Blog, 27 Sept. 2019, https://blog.wahoofitness.com/cycling-cadence-what-is-it-how-to-improve-yours/.
# Citation below is site which helped with coding suggestions
# Hunter, John. "Annotating Axes." Matplotlib, 21 Nov. 2019, http://lagrange.univ-lyon1.fr/docs/matplotlib/users/annotations_guide.html.
# Source for Violin Plot guidance - https://stackoverflow.com/questions/31594549/how-do-i-change-the-figure-size-for-a-seaborn-plot
# SEABORN Violin Plot of Professor Brooks' CYCLING Cadence vs Ideal Cycling Cadence
sns.set_style('ticks')
fig, ax = plt.subplots()
fig.set_size_inches(15, 8)
data_cycle=data_cycle.rename(columns={"cadence": "Prof Brooks Cyc Cadence","Cadence":"Prof Brooks Run Cadence"})
sns.violinplot(data=data_cycle, inner="points", x="Prof Brooks Cyc Cadence",ax=ax)
sns.despine()
plt.title('Professor Brooks CYCLING Cadence vs Ideal Cycling Cadence')
plt.xlabel('Professor Brooks CYCLING Cadence (RPMs)')
# Set theme
sns.set_style('whitegrid')
bbox_props=dict(boxstyle="rarrow,pad=0.3", fc="white", ec="b", lw=2)
a=ax.text(54, 0, "Professor Brooks Average CYCLING Cadence 77.75", ha="center", va="center", rotation=360,
size=15,
bbox=bbox_props)
bbox_props=dict(boxstyle="larrow,pad=0.3", fc="yellow", ec="b", lw=4)
b=ax.text(118, 0, "Ideal Target CYCLING Cadence for Professor Brooks 90.00", ha="center", va="center", rotation=360,
size=15,
bbox=bbox_props)
fig.savefig('violin.png')
# Read source files
data_run=pd.read_csv("strava.csv",index_col='timestamp',usecols=['Cadence','timestamp','cadence'])
# Cleanup data to prepare only what's relevant for visual output
data_run=data_run.dropna()
# Calculate Professor Brooks RUNNING Cadence mean.
print("Professor Brooks RUNNING Cadence Mean",data_run['Cadence'].mean())
# SEABORN Violin Plot of Professor Brooks' RUNNING Cadence vs Ideal Running Cadence
sns.set_style('ticks')
fig, ax=plt.subplots()
fig.set_size_inches(15, 8)
# Plot the capital-'C' Cadence column, the running cadence field whose mean is printed above
sns.violinplot(data=data_run, inner="points", x="Cadence",color='cyan',ax=ax)
sns.despine()
plt.title('Professor Brooks RUNNING Cadence vs Ideal Running Cadence')
plt.xlabel('Professor Brooks RUNNING Cadence (SPMs)')
# Set theme
sns.set_style('whitegrid')
bbox_props=dict(boxstyle="rarrow,pad=0.3", fc="white", ec="b", lw=2)
a=ax.text(53, 0, "Professor Brooks Average RUNNING Cadence 77.73", ha="center", va="center", rotation=360,
size=15,
bbox=bbox_props)
bbox_props=dict(boxstyle="larrow,pad=0.3", fc="yellow", ec="b", lw=4)
b=ax.text(170, 0, "Ideal Target RUNNING Cadence for Professor Brooks 140.00", ha="center", va="center", rotation=360,
size=15,
bbox=bbox_props)
fig.savefig('violin2.png')
# Comments and Analysis of input file and output and what this means for Professor Brooks exercise program and best practices for optimal health
print("")
print("First Simple Plot Requirement: Violin Plot used")
print("")
print("Justification for picking the violin plot was:")
print("It works to graphically represent all at once various components of each data record such as the highest point, the lowest point, the mean, and probability density. I specifically used it now though")
print("to point out the MEAN portion of Professor Brooks cycling cadence and running cadence to make a visual point with annotative labeled arrows pointing out where his average cadence was versus where the")
print("ideal cadence should be for optimal health and performance.")
print("")
print("See analysis below on what this means for Professor Brooks exercise program and best practices for optimal health.")
print("")
print("NOTE: There were TWO separate CADENCE fields spelled the same with one exception in the strava.csv file. The exception was one started with a capital 'C'.")
print("I noticed the first cadence field started with capital 'C' and was clearly right next to the other RUNNING data fields so I assumed it was cadence for running.")
print("The other cadence field started with lower case 'c' and was clearly right next to the CYCLING fields, and so I used it for calculating the cycling cadence.")
print("")
print("Analysis for Professor Brooks:")
print("According to cycling experts, one should aim for 90 RPM (Revolutions Per Minute) as a good goal to avoid leg fatigue and injury. Professor Brooks is averaging 77.75 RPM and so is not too far off the ideal goal.")
print("According to running experts, one should aim for 140 SPM (Steps Per Minute) for optimal health. According to the strava.csv data running cadence field, Professor Brooks is averaging 77.73 SPM which is way off")
print("the 140 SPM goal and needs to shorten his stride to get more steps in to ensure less bounce which ensures less stress on the hips and knees.")
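# A small sketch that puts numbers on the 'not too far off' and 'way off' wording in the
# analysis above by expressing each average as a shortfall from its target.
# Assumptions: the helper name cadence_shortfall is hypothetical; the averages (77.75, 77.73)
# and targets (90, 140) are the figures quoted in the analysis above, not recomputed here.
def cadence_shortfall(average, target):
    """Return (absolute gap, gap as a percentage of the target)."""
    gap = target - average
    return gap, 100 * gap / target

example_cycling_gap, example_cycling_pct = cadence_shortfall(77.75, 90.0)
example_running_gap, example_running_pct = cadence_shortfall(77.73, 140.0)
# With those inputs the cycling average sits roughly 14% under its target,
# while the running average sits roughly 44% under its target.
print("Cycling shortfall: {:.2f} RPM ({:.1f}% below target)".format(example_cycling_gap, example_cycling_pct))
print("Running shortfall: {:.2f} SPM ({:.1f}% below target)".format(example_running_gap, example_running_pct))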
# Scatter plots
# Import tools
from IPython.display import display, HTML; display(HTML("<style>.container { width:99% !important; }</style>"))
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import plotly.graph_objects as go
#import plotly.offline as py
#py.init_notebook_mode(connected=False)
#py.iplot(fig)
# Form power is defined as 'running in place' power
# Leg spring stiffness is a measure of how flexible or stiff legs are
# There are theories suggesting that Leg Spring Stiffness (LSS) is correlated to Form Power (FP).
# Assuming that it is, let's see what the data tells us about Professor Brooks' Leg Spring Stiffness (LSS) and how it is translating into Form Power (FP) for him.
print("")
print("Second Simple Plot Requirement: Scatter Plot used")
print("")
print("Justification for picking the scatter plot is below:")
print("The scatter plot accurately and visually depicted my data set due to the quantitative nature of the data. Scatterplot was perfect to represent the numbers of 'Leg Spring Stiffness' and 'Form Power' over a period of time.")
print("This is just what scatterplot does best; depicting numerical statistics over a period of time.")
print("")
print("Analysis for Professor Brooks:")
print("Professor Brooks, See graphs below on Leg Spring Stiffness (LSS) Vs. Form Power (FP). Your LSS is very consistent in values in the small period of time observed below. This means your running form is very steady,")
print("and you aren't wasting energy as you make your strides forward. What's alarming is the fluctuation in form power. It fluctuates too much. To be honest, I don't know what that means except I believe there is something")
print("there for you to investigate with a trainer. Why is your Form Power fluctuating so much? Typically LSS plays a part in Form Power. All I can say is that your LSS looks good and so there must be something else causing")
print("your FP to vary so much.")
# Read data file
df=pd.read_csv("strava.csv",usecols=['timestamp','Leg Spring Stiffness','Form Power'])
# Cleanup data to prepare only what's relevant for visual output
df=df.dropna()
# Filter specific data based on particular day
df['timestamp']=pd.to_datetime(df['timestamp'])
start_date='8/17/2019 0:00'
end_date='8/17/2019 22:00'
mask=(df['timestamp'] >= start_date) & (df['timestamp'] <= end_date)
df=df.loc[mask]
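# The same parse-then-mask date filter recurs in several cells of this notebook. As a
# hedged sketch, it could be wrapped in one helper so each cell only states its window.
# Assumptions: the helper name filter_by_time_window and the sample rows are hypothetical
# and made up for illustration; the real notebook keeps using the inline mask shown above.
import pandas as pd

def filter_by_time_window(frame, start, end, time_col="timestamp"):
    """Return a copy of the rows whose time_col falls within [start, end], inclusive."""
    out = frame.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    window = (out[time_col] >= pd.Timestamp(start)) & (out[time_col] <= pd.Timestamp(end))
    return out.loc[window]

# Hypothetical sample rows (not real readings)
example_rides = pd.DataFrame({
    "timestamp": ["2019-08-16 23:59:59", "2019-08-17 07:30:00",
                  "2019-08-17 21:59:59", "2019-08-18 06:00:00"],
    "heart_rate": [101, 118, 131, 110],
})
example_day = filter_by_time_window(example_rides, "8/17/2019 0:00", "8/17/2019 22:00")
# Only the two rows stamped on 8/17/2019 before 22:00 remain in example_day.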
# Using PLOTLY to graph LSS vs FORM POWER
# Citing source of help: https://plot.ly/python/plot-data-from-csv/
fig1=go.Figure(go.Scatter(x = df['timestamp'], y = df['Leg Spring Stiffness'],
name='LSS'))
fig2=go.Figure(go.Scatter(x = df['timestamp'], y = df['Form Power'],
name='FP'))
fig1.update_layout(title='LSS vs FP',
plot_bgcolor='rgb(130, 230,230)',
showlegend=True)
fig1.show()
fig2.update_layout(title='',
plot_bgcolor='rgb(255, 100,10)',
showlegend=True)
fig2.show()
# NOTE - failures: tried Seaborn Scatter plot and Plotly Bubble Scatter Plot; neither worked real well to display anything of significance
# Import tools
from IPython.display import display, HTML; display(HTML("<style>.container { width:99% !important; }</style>"))
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
print("")
print("Second Simple Plot Requirement: Scatter Plot used")
print("")
print("Justification for picking the scatter plot is below:")
print("The scatter plot accurately and visually depicted my data set due to the quantitative nature of the data. Scatterplot was perfect to represent the numbers of 'Heart Rate' and 'Distance' over a period of time.")
print("This is just what scatterplot does best; depicting numerical statistics over a period of time.")
print("")
print("Analysis for Professor Brooks:")
print("Professor Brooks, See the Correlation of Distance and Heart Rate Scatter Plot graph below. I am noticing about 5 main patterns in your exercise routine. Not sure if this is your plan, but it appears you are")
print("fluctuating your exercise routine to reach an average heart rate level on different occasions. For your best health and safety in training, you need to do two main things:")
print("1. Determine what your maximum heart rate should be using this formula: 220 - age = Max Heart Rate (MHR).")
print("2. Once you know what your ideal maximum heart rate should be, then you can plan your runs/cycling to reach a certain average percentage of it. For example, when running, you should train at 50 to 85 percent of your maximum heart rate.")
print("If your heart rate dips below the 50 to 85 percent level, then you may want to increase your pace to get better results. However, if your heart rate goes above the maximum, you need to go at a slower pace.")
print("Hypothetical example: Let's say your target maximum heart rate based on your age using the formula mentioned above is 185. 50% of 185 is 92.5 and 85% of 185 is 157.25. So in that case, your goal would be to keep your average heart rate")
print("at a minimum of 92.5 but no more than 157.25. Looking at the plot below, most of your routines are leading to an average heart rate slightly above the 157.25. Only a few of your routines seem to be well within your range.")
print("Based on this chart alone, I would caution you to take it easy on your exercise routines to keep your average heart rate slightly lower. You appear to be an over-achiever type and may be overdoing it. Am I right? Please give this")
print("some serious thought. We want to keep you around a lot longer. ;-)")
print("")
print("One thing I want to congratulate you on as seen between the correlation of distance and average heart rate is the consistency of keeping your heart rate close to the same average during a particular routine; this probably indicates your")
print("ability to keep a steady pace; that's a healthy way to train. Keep up the good work on maintaining a steady pace; again though, just make sure you take it easy to keep that heart rate down slightly.")
# Cite source of help - https://www.healthline.com/health/running-heart-rate
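# A minimal sketch of the heart rate arithmetic described in the analysis above:
# Max Heart Rate = 220 - age, with a training range of 50 to 85 percent of that maximum.
# Assumptions: the helper name heart_rate_training_range is hypothetical, and the age of 35
# is chosen only because it reproduces the hypothetical 185 used in the worked example above;
# it is not a statement about Professor Brooks' actual age.
def heart_rate_training_range(age, low_fraction=0.50, high_fraction=0.85):
    """Return (max heart rate, lower training bound, upper training bound) in BPM."""
    max_heart_rate = 220 - age
    return max_heart_rate, max_heart_rate * low_fraction, max_heart_rate * high_fraction

example_mhr, example_low, example_high = heart_rate_training_range(35)
print("Hypothetical MHR: {} BPM, training range: {:.2f} to {:.2f} BPM".format(example_mhr, example_low, example_high))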
# Read data file
data=pd.read_csv("strava.csv",usecols=['timestamp','Air Power','Power','Vertical Oscillation','distance','heart_rate','enhanced_altitude'])
# Cleanup data to prepare only what's relevant for visual output
data=data.dropna()
# Filter specific data based on a particular date range (8/21/2019 through 8/31/2019)
data['timestamp']=pd.to_datetime(data['timestamp'])
start_date='8/21/2019 0:00'
end_date='8/31/2019 23:00'
mask=(data['timestamp'] >= start_date) & (data['timestamp'] <= end_date)
data=data.loc[mask]
# Use PLOTLY SCATTER PLOT TECHNIQUE to communicate correlation between distance and heart rate for Professor Brooks
fig1 = px.scatter(data, x='distance', y='heart_rate')
fig1.update_layout(title='Correlation of Distance and Heart Rate for Professor Brooks',
plot_bgcolor='rgb(199, 230,150)',
showlegend=True)
fig1.show()
# Scatter plot
# Import tools
from IPython.display import display, HTML; display(HTML("<style>.container { width:99% !important; }</style>"))
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
# Read data file
data=pd.read_csv("strava.csv",usecols=['timestamp','cadence','heart_rate'])
print("")
print("Second Simple Plot Requirement: Scatter Plot used")
print("")
print("Justification for picking the scatter plot was:")
print("The scatter plot accurately and visually depicted my data set due to the quantitative nature of the data. Scatterplot was perfect to represent the numbers of 'Heart Rate' and 'Cycling Cadence' over a period of time.")
print("This is just what scatterplot does best; depicting numerical statistics over a period of time.")
print("")
print("Analysis for Professor Brooks:")
print("Professor Brooks, See the Correlation of Cycling Cadence and Heart Rate Scatter Plot graph below. I am noticing your average cycling cadence of 77.75 is causing your heart rate to be well above the ideal target heart rate range.")
print("According to https://www.road-bike.co.uk/articles/cyclingcadence.php , your average cycling cadence should be higher at 90 to 100 RPM. The reasoning behind this is that the faster you pedal, the less stress on your muscles as each turn")
print("requires less push and thus as a result the heart has less work to do and lowers your heart rate closer to a healthier target heart rate range. There are way fewer dots below in the 90 to 100 RPM range. Notice though when you do reach 90 to 100 RPM,")
print("how much lower your average heart rate is? Please consider increasing your cycling cadence as it will be easier on your heart allowing you to continue to exercise in a healthy and safe manner.")
# Use PLOTLY SCATTER PLOT TECHNIQUE to communicate correlation between Cycling Cadence and heart rate for Professor Brooks
fig2 = px.scatter(data, x='cadence', y='heart_rate')
fig2.update_layout(title='Correlation of Cycling Cadence and Heart Rate for Professor Brooks',
plot_bgcolor='rgb(230, 230,150)',
showlegend=True)
fig2.show()
# SEABORN HEAT plot
print("SEABORN HEAT MAP")
# Import tools
from IPython.display import display, HTML; display(HTML("<style>.container { width:99% !important; }</style>"))
%matplotlib inline
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Summary and Analysis for Professor Brooks
print("")
print("Third Simple Plot Requirement: HEAT MAP used")
print("")
print("Justification for picking the HEAT MAP was:")
print("The heat map is used to depict certain truths in a visual way using colors; for example, a progression from lighter to darker colors can represent a certain degree of increasing")
print("intensity or increasing value. In my example below, I needed a way to show Professor Brooks visually how his running cadence was the perfect pattern at one point. It was increasing")
print("over time which in turn caused his vertical oscillation to decrease and thereby increased his overall power. The heat maps were the perfect visualization tool to show step by step how these")
print("three field values are correlated in that one causes the outcome for another which causes the final outcome leading to more power for Professor Brooks.")
print("")
print("Analysis for Professor Brooks:")
print("Professor Brooks, See the 3 HEAT MAPS below. It is proposed among experts in the running community that increasing your steps per minute (cadence) will")
print("cause less bounce (vertical oscillation) and thus increase your overall power as a runner. Looking at a 10 second interval of one of your runs, I discovered a prime example of this theory")
print("where you increased your running cadence which in turn decreased your vertical oscillation and thereby caused an increase in power for you during that 10 second interval.")
print("So take a look at the heat maps below which illustrate that very thought, and make a note to try and increase your overall running cadence as I mentioned previously in the violin plot. This will help your form with less bounce")
print("and give you more power during your runs.")
print("")
# Read data file
data=pd.read_csv("strava.csv",usecols=['timestamp','Cadence','Vertical Oscillation','Power'])
#data=pd.read_csv("strava.csv",usecols=['timestamp','Air Power','Power','Vertical Oscillation','distance','heart_rate','enhanced_altitude'])
# Cleanup data to prepare only what's relevant for visual output
data=data.dropna()
# Filter specific data down to a 10-second window of one run
data['timestamp']=pd.to_datetime(data['timestamp'])
start_date='8/18/2019 20:41:48'
end_date='8/18/2019 20:41:58'
mask=(data['timestamp'] >= start_date) & (data['timestamp'] <= end_date)
data=data.loc[mask]
# HEAT map specific code
print(data)
# As your Running Cadence increases, you are taking more steps per minute which decreases Vertical Oscillation (less bounce) which in turn increases Power!
# Below is HEAT MAP Showing increasing Running Cadence from 20:41:48 to 20:41:58 on 08-18-2019
result = data.pivot(index='Cadence', columns='timestamp', values='Cadence')
sns.heatmap(result, annot=True, fmt="g", cmap='coolwarm')
plt.show()
# Below is HEAT MAP Showing decrease of Vertical Oscillation due to increasing Running Cadence from 20:41:48 to 20:41:58 on 08-18-2019
result = data.pivot(index='Vertical Oscillation', columns='timestamp', values='Vertical Oscillation')
sns.heatmap(result, annot=True, fmt="g", cmap='coolwarm')
plt.show()
# Below is HEAT MAP Showing increase of POWER due to decreasing Vertical Oscillation from 20:41:48 to 20:41:58 on 08-18-2019
result = data.pivot(index='Power', columns='timestamp', values='Power')
sns.heatmap(result, annot=True, fmt="g", cmap='coolwarm')
plt.show()
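# The three heat maps above show cadence, vertical oscillation, and power one at a time.
# One hedged way to check whether two of those series actually move together is a Pearson
# correlation coefficient (near -1 means one falls as the other rises, near +1 means they
# rise together). A correlation over a 10-second window would not establish cause and effect.
# Assumptions: the helper name pearson_correlation is hypothetical, and the sample values
# are made-up placeholders, not readings from strava.csv.
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Placeholder series: cadence rising steadily while vertical oscillation falls steadily
example_cadence = [78, 79, 80, 81, 82]
example_oscillation = [8.0, 7.5, 7.0, 6.5, 6.0]
example_r = pearson_correlation(example_cadence, example_oscillation)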
# (15%) Have you demonstrated that you have a solid grasp of at least one of the more advanced visual analysis techniques in this class (time series, 3d, geographic/mapping, spatial)
# and that it was appropriate for the analysis/data you were investigating? The grading rubric is the same as the basic plots. You may use other advanced plots with permission in this
# category (ask first to ensure they seem reasonably advanced).
# Import tools
from IPython.display import display, HTML; display(HTML("<style>.container { width:99% !important; }</style>"))
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import plotly.graph_objects as go
# Read data file
df=pd.read_csv("strava.csv",usecols=['timestamp','position_lat','position_long','altitude','heart_rate'])
# Overall max heart rate observed across the whole file (computed before any rows are dropped).
# Named overall_max_hr so it does not shadow Python's built-in max().
overall_max_hr=df['heart_rate'].max()
# Cleanup data to prepare only what's relevant for visual output
df=df.dropna()
# Convert timestamps first so the sort and the date filter compare datetimes rather than strings
df['timestamp']=pd.to_datetime(df['timestamp'])
df=df.sort_values('timestamp')
# Filter specific data based on particular day
start_date='7/18/2019 0:00'
end_date='7/18/2019 23:00'
mask=(df['timestamp'] >= start_date) & (df['timestamp'] <= end_date)
df=df.loc[mask]
# Heart rate and altitude statistics observed for 7/18/2019
day_max_hr=df['heart_rate'].max()
day_min_hr=df['heart_rate'].min()
day_avg_hr=df['heart_rate'].mean()
day_max_alt=df['altitude'].max()
day_min_alt=df['altitude'].min()
day_avg_alt=df['altitude'].mean()
# Conversion of latitude and longitude values into decimal format
df["position_lat"]=df["position_lat"] * ( 180 / 2**31 )
df["position_long"]=df["position_long"] * ( 180 / 2**31 )
# Analysis relevant to meet requirement
print("ADVANCED PLOT REQUIREMENT - used Geographical Mapping")
print("")
print("Reason for picking the geographical mapping plot as my advanced plot choice was: ")
print("It allowed me to visually show Professor Brooks his actual exercise paths taken on a map of the United States, specifically the exact latitude and longitude coordinates at Ann Arbor, MI.")
print("It was also the perfect illustrator in that each plotted point of his cycling or running path shows the coordinates, the altitude, and even heart rate at that particular point in time.")
print("One other really cool feature is that it allows one to zoom into Ann Arbor, MI using the mouse wheel for a closer look at his exercise path and by using the mouse pointer, one can hover over")
print("each plotted point and see what Professor Brooks' heart rate was at that point in time and elevation.")
print("")
print("Interesting facts obtained while querying and setting up the Geographical Mapping on Professor Brooks' exercise stats:")
print("Professor Brooks overall maximum 'max' heart rate:",overall_max_hr,"Beats Per Minute 'BPM'")
print("Professor Brooks max heart rate on 7/18/19:",day_max_hr,"BPM")
print("Professor Brooks min heart rate on 7/18/19:",day_min_hr,"BPM")
print("Professor Brooks avg heart rate on 7/18/19:",day_avg_hr,"BPM")
print("Professor Brooks max altitude reached on 7/18/19:",day_max_alt)
print("Professor Brooks min altitude reached on 7/18/19:",day_min_alt)
print("Professor Brooks avg altitude on 7/18/19:",day_avg_alt)
print("")
print("Analysis for Professor Brooks:")
print("I noticed the geographical mapping is showing Professor Brooks only reached a maximum heart rate of 135.0 Beats Per Minute (BPM) on the mapped day of 7/18/19.")
print("According to strava.csv, his max heart rate reached during exercise was 183 BPM. Let's just assume that the formula states that should be his target max heart rate.")
print("It has been said that one should train at 50 to 80 percent of one's target max heart rate. If Professor Brooks target max is the 183, that means 50 to 80 percent of that rate")
print("would make a range of 91.5 to 146.40 BPM. Judging only from the geographical mapping chart below for 7/18/19, it appears his minimum heart rate was just slightly below the target minimum")
print("and his maximum heart rate was slightly under his target maximum. This was a healthy exercise day for Professor Brooks where his average heart rate was 115.56 falling right in line with the")
print("recommended range for him.")
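# The latitude/longitude conversion earlier in this cell multiplies the raw values by
# 180 / 2**31. That factor suggests the raw values are stored as signed 32-bit 'semicircles',
# where 2**31 semicircles correspond to 180 degrees. A minimal sketch of the same arithmetic
# as a reusable helper follows.
# Assumptions: the helper and constant names are hypothetical, and the inputs are round
# powers of two picked so the results are easy to check by hand; they are not coordinates
# from strava.csv.
SEMICIRCLES_TO_DEGREES = 180 / 2**31

def semicircles_to_degrees(semicircles):
    """Convert a semicircle-encoded angle to decimal degrees."""
    return semicircles * SEMICIRCLES_TO_DEGREES

example_quarter_turn = semicircles_to_degrees(2**30)    # half of the positive range
example_negative_limit = semicircles_to_degrees(-2**31)  # most negative 32-bit value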
# Used PLOTLY library to help plot the advanced geographical mapping
# Cite source of help: https://plot.ly/python/scatter-plots-on-maps/#customize-geographical-scatter-plot
df['details']=df['altitude'].astype(str)+","+df['heart_rate'].astype(str)
fig=go.Figure(data=go.Scattergeo(
lat=df['position_lat'],
lon=df['position_long'],
mode='markers',
text=df['details']
))
fig.update_layout(
title='Geographical Mapping of Professor Brooks Daily Exercise Path on 7/18/19<br>(place mouse pointer over points for more details on latitude, longitude, altitude and heart rate)<br>(use mouse scroll wheel to zoom into Ann Arbor, MI)',
geo_scope='usa',
)
fig.show()
print("This concludes my presentation.")
print("Thank you for taking the time to evaluate it!")